Abstract


The MODIS Level-3 optical thickness and effective radius cloud product is a gridded 
1°×1° dataset that is derived from aggregation and subsampling at 5 km of 1 km 
resolution Level-2 orbital swath data (Level-2 granules). This study examines the impact 
of the 5 km subsampling on the mean, standard deviation and inhomogeneity parameter 
statistics of optical thickness and effective radius. The methodology is simple and 
consists of estimating mean errors for a large collection of Terra and Aqua Level-2 
granules by taking the difference of the statistics at the original and subsampled 
resolutions. It is shown that the Level-3 sampling does not affect the various quantities 
investigated to the same degree, with second order moments suffering greater 
subsampling errors, as expected. Mean errors drop dramatically when averages over a 
sufficient number of regions (e.g., monthly and/or latitudinal averages) are taken, 
pointing to a dominance of errors that are of random nature. When histograms built from 
subsampled data with the same binning rules as in the Level-3 dataset are used to 
reconstruct the quantities of interest, the mean errors do not deteriorate significantly. The 
results in this paper provide guidance to users of MODIS Level-3 optical thickness and 
effective radius cloud products on the range of errors due to subsampling they should 
expect and perhaps account for, in scientific work with this dataset. In general, 
subsampling errors should not be a serious concern when moderate temporal and/or 
spatial averaging is performed. 
I. Introduction

In order to study the global distribution of cloud properties and the main features of their 
monthly, seasonal and diurnal evolution, in other words, in order to examine cloud 
climatology, a gridded set of spatially-averaged cloud retrievals is the most appropriate. 
Such a product is provided for the MODIS instrument aboard the EOS Terra and Aqua 
platforms as Level-3 MOD08* (Terra) and MYD08* (Aqua) datasets [1]. There are 
actually three Level-3 MODIS cloud products available for each platform. Statistics are 
summarized over a 1°×1° global grid for daily (D3), eight-day (E3), and monthly (M3) 
time scales. Each of the Level-3 products contains statistics generated from the Level-2 
(Orbital Swath) products. Statistics for a given derived quantity or Science DataSet 
(SDS) might include: simple (mean, minimum, maximum, standard deviation) statistics; 
parameters of normal and lognormal distributions; fraction of pixels that satisfy some 
condition (e.g. cloudy, clear); histograms of the quantity within each gridpoint; 
histograms of the confidence placed in the retrieved quantity; histograms and/or 
regressions derived from comparing one science parameter to another; statistics 
computed for a subset that satisfies some condition [1]. All these statistics are computed 
by subsampling pixel-level values every 5 km since the geolocation internal to the 
MOD06 (Level-2) cloud product is 5 km [1]. Thus, cloud statistics for an overcast 1°×1° 
gridpoint around the equator come from about 480 pixels instead of the ~12,000 1-km 
pixels that are originally contained within the gridpoint. The aim of this study is to 
examine whether this subsampling has distorting effects on several Level-3 SDSs and on 
some quantities of interest derived from them. This is obviously an important issue for 
current and future users of the Level-3 cloud dataset who intend to compare MODIS 
cloud climatologies with those from other sources. 
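
The pixel counts quoted above follow from simple arithmetic. As a rough sketch, assuming a gridpoint of exactly 110×110 nominal 1 km pixels and a sampling stride of 5 that starts at the first pixel (both are illustrative assumptions, not values taken from the MODIS processing code):

```python
# Illustrative count of 1 km pixels in a gridpoint before and after 5 km subsampling.
# Assumes a 110x110 pixel gridpoint and a stride of 5 starting at index 0.
side = 110
stride = 5

full_count = side * side                          # all 1 km pixels in the gridpoint
sampled_per_side = len(range(0, side, stride))    # pixels kept along one direction
sampled_count = sampled_per_side ** 2             # pixels kept by the subsampling

print(full_count, sampled_count, full_count / sampled_count)
```

Under these assumptions only 1 pixel in 25 contributes to the Level-3 statistics of an overcast gridpoint.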

The outline of the paper is as follows. In Section II we present the dataset 
used to examine the subsampling effect, the SDSs and other quantities we are interested 
in, and discuss the methodology for analyzing the subsampling errors. In Section III we 
present results for optical thickness statistics, and in Section IV for effective radius 
statistics. Section V examines whether the findings in Sections III and IV are affected 
when the quantities of interest are derived from histograms built by following the Level-3 
binning rules for optical thickness and effective radius. The final section consists of an 
overview discussion of our findings and their implications for users of MODIS Level-3 
cloud climatologies. 

II. Dataset and methodology 

We use 300 Level-2 granules obtained for various post-2000 November months at 
~40° N for both Terra (200 granules) and Aqua (100 granules). Each granule has 2030 
pixels along track and 1354 lines of pixels across track. For those pixels identified as 
cloudy from the cloud masking algorithm [2], the cloud phase is determined (liquid, ice, 
undetermined) and subsequently cloud optical thickness, τ, and cloud effective radius, r_eff 
(ratio of the third to the second moment of the cloud particle radius distribution), are 
retrieved (among others) [3]. The retrievals used here come from the 0.65 µm (over land) 
and 0.86 µm (over ocean) bands that are the most sensitive to changes in cloud optical 
thickness, in conjunction with the 2.1 µm band, which is most sensitive to changes in 
cloud particle size [3]. Here, the pixel-by-pixel phase determination for our dataset will 
be largely ignored since it is not an essential factor in sampling error estimations, as will 
become evident later. The only time phase enters our discussion is in Section V where, 
due to different histogram binning rules between the two phases, all cloudy pixels are 
assumed to be of one or the other phase. 

In the Level-3 dataset, the statistics of each 1°×1° gridpoint have been derived from 
aggregation and subsampling at 5 km of approximately (near the equator) 110×110 pixels 
with a nominal resolution of 1 km. However, the number of pixels available to be 
subsampled decreases approximately with the cosine of latitude as one moves poleward. 
For example, at ~83°, each 1°×1° gridpoint is made of ~1600 1 km pixels. Thus, the 
number of pixels used to construct the Level-3 statistics can potentially become quite 
small, especially when only a fraction of the gridpoint is cloudy (as is often the case). We 
therefore have to examine the impact of the varying number of pixels used to construct 
Level-3 statistics in our analysis. 

Our approach is the following: We divide our granules into 110×110, 100×100, ..., 
40×40 pixel regions (i.e., 8 region sizes). Since one of the main goals is to examine the 
effects of subsampling on the cloud optical thickness inhomogeneity climatology 
presented in another paper in preparation (Oreopoulos and Cahalan 2004), we keep, as in 
that work, only regions with cloud fraction (fraction of pixels with non-zero optical 
thickness) greater than 0.1. For each of these regions (e.g. ~53,000 regions of 110x110 
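pixels), we calculate for our optical thickness analysis: cloud fraction (CF), spatial mean 
of optical thickness τ̄, standard deviation of optical thickness σ_τ, and the inhomogeneity 
parameters ν_MOM, ν_MLE, and χ:

$$\nu_{MOM} = \left(\frac{\bar{\tau}}{\sigma_{\tau}}\right)^{2} \qquad (1a)$$

$$\nu_{MLE} = \frac{1 + \sqrt{1 + 4y/3}}{4y} \qquad (1b)$$

$$\chi = \frac{\exp\left(\overline{\ln\tau}\right)}{\bar{\tau}} \qquad (1c)$$

where $y = \ln\bar{\tau} - \overline{\ln\tau}$. The first two equations provide two different ways to estimate the 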
shape parameter of a gamma distribution which has been found to describe well observed 
distributions of cloud optical thickness [4], [5]. The first equation is for the Method Of 
Moments (MOM), and the second is an empirical approximation for the Maximum 
Likelihood Estimate (MLE) method which gives a shape parameter less sensitive to 
outliers [6]. The third equation is the definition of the inhomogeneity parameter of 
Cahalan et al. [7] which approximates the factor by which τ̄ should be multiplied to 
recover the mean albedo of a region. For the effective radius analysis we calculate mean 
and standard deviation of effective radius. 
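
The three inhomogeneity parameters can be computed from the cloudy pixels of a region roughly as follows. This is a minimal sketch: the function name, the use of the population standard deviation, and the specific closed form used for the MLE approximation (a commonly used Thom-type estimator) are assumptions made for illustration and are not necessarily the exact implementation behind the results reported here.

```python
import numpy as np

def inhomogeneity_parameters(tau):
    """Return (nu_mom, nu_mle, chi) for the cloudy pixels of one region.

    Hypothetical helper; undefined for a perfectly homogeneous region
    (zero standard deviation, y = 0).
    """
    tau = np.asarray(tau, dtype=float)
    tau = tau[tau > 0]                       # cloudy pixels only (non-zero optical thickness)
    mean_tau = tau.mean()
    std_tau = tau.std()                      # population standard deviation (assumed)
    mean_ln_tau = np.log(tau).mean()

    nu_mom = (mean_tau / std_tau) ** 2       # method of moments shape parameter
    y = np.log(mean_tau) - mean_ln_tau       # non-negative by Jensen's inequality
    # Empirical approximation to the maximum likelihood shape parameter (assumed form).
    nu_mle = (1.0 + np.sqrt(1.0 + 4.0 * y / 3.0)) / (4.0 * y)
    chi = np.exp(mean_ln_tau) / mean_tau     # bounded above by 1
    return nu_mom, nu_mle, chi
```

For optical thickness values drawn from a gamma distribution, both shape estimates should land near the true shape parameter, while χ stays between 0 and 1.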

For both optical thickness and effective radius we calculate the quantities described 
above in two ways: 1) by using all the cloudy pixels within the region; and 2) by using 
only every 5th pixel along both spatial directions, if it happens to be cloudy. We then 
calculate the percentage difference of the values obtained from the above two methods: 
this gives the impact of the subsampling as a percentage error (positive signifies that 
subsampling underestimates). We ignore cloud phase in this procedure, so the means and 
standard deviations calculated correspond most closely to their counterpart SDSs for 
“Combined Optical Thickness” and “Combined Effective Radius” in the Level-3 MODIS 
products. 
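
The comparison between the two calculations can be sketched as below. The helper names are hypothetical, and two details are assumptions: the sampling starts at the first pixel of the region, and the percentage difference is normalized by the full-resolution value.

```python
import numpy as np

def percent_error(full_value, sampled_value):
    # Positive when subsampling underestimates the full-resolution value.
    return 100.0 * (full_value - sampled_value) / full_value

def region_errors(tau_region, stride=5):
    """Percentage subsampling errors for one region.

    tau_region: 2-D array of optical thickness, with zeros marking clear pixels.
    """
    tau_region = np.asarray(tau_region, dtype=float)
    sub = tau_region[::stride, ::stride]      # every 5th pixel in both directions
    full_cloudy = tau_region[tau_region > 0]
    sub_cloudy = sub[sub > 0]                 # sampled pixels that happen to be cloudy
    return {
        "mean": percent_error(full_cloudy.mean(), sub_cloudy.mean()),
        "std": percent_error(full_cloudy.std(), sub_cloudy.std()),
        "cf": percent_error(full_cloudy.size / tau_region.size,
                            sub_cloudy.size / sub.size),
    }
```

The same pattern applies to effective radius, or to any other statistic computed from the cloudy pixels of a region.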

The analysis shown in the following also accounts for the fact that, most of the time, 
we are not interested in the error of a single region, but in the error of an ensemble of 
regions. For example, in the work by Oreopoulos and Cahalan in preparation, the 
authors are interested in the climatology of χ and ν, so they examine monthly, zonal, and 
global averages of these quantities. The mean error of an ensemble of 30 regions can then 
be thought of as the mean monthly sampling error for a single 1°×1° gridpoint. Similarly, 
the error for an ensemble of 90 regions can be thought of as the mean seasonal sampling 
error of a single gridpoint, the error of an ensemble of 360 regions as the mean annual 
sampling error of a single gridpoint or the daily error of a latitude zone, and the error for 
an ensemble of 10,000 regions (~30×360) as the mean monthly subsampling error of a 
latitude zone. To examine these “climatological” errors, we construct 1000 ensembles of 
regions with each ensemble obtained by combining in a random fashion a prespecified 
number of regions (1, 30, 90, 360, 10,000) for each of our 8 region sizes (5000 ensembles 
for each region size, i.e., 1000 consisting of 1 region, 1000 consisting of 30 regions, etc.). 
We can subsequently examine the distribution of errors for these 40,000 ensembles. 
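
The ensemble construction can be sketched as follows. The function name and the random seed are hypothetical, and drawing regions without replacement within an ensemble is an assumption, since the text only states that regions are combined in a random fashion.

```python
import numpy as np

def ensemble_mean_errors(region_errors, ensemble_size, n_ensembles=1000, seed=0):
    """Mean percentage error of each of n_ensembles randomly drawn ensembles."""
    rng = np.random.default_rng(seed)
    region_errors = np.asarray(region_errors, dtype=float)
    means = np.empty(n_ensembles)
    for k in range(n_ensembles):
        # Assumption: regions are drawn without replacement within one ensemble.
        picked = rng.choice(region_errors.size, size=ensemble_size, replace=False)
        means[k] = region_errors[picked].mean()
    return means

# Ensemble sizes named in the text: 5 sizes x 1000 ensembles x 8 region sizes = 40,000.
ENSEMBLE_SIZES = (1, 30, 90, 360, 10_000)
```

Calling `ensemble_mean_errors` once per entry of `ENSEMBLE_SIZES` and per region size yields the distributions of ensemble-mean errors examined in the following sections. If the per-region errors are mostly random, the spread of the ensemble means should shrink as the ensemble size grows.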

III. Optical thickness errors 

Figure 1 shows the errors of subsampling (in %) of τ̄ and σ_τ for all (~53,700) 110×110 
regions of our dataset (except for those whose errors fall outside the ±50% bounds of the 
plot). We see that the errors for individual regions are often quite large, although the 
greatest concentration of points is within the ±20% error bounds. There is about the same 
number of regions with positive and negative errors in τ̄, and the same applies for 
σ_τ. This is a good indication of the random nature of these errors. For most regions 
(76.6% of the regions) overestimates in τ̄ by subsampling are accompanied by 
overestimates in σ_τ and vice versa (upper-right and lower-left quadrants), but the number 
of regions where the errors are of opposite sign is substantial. The top panel of Fig. 2 shows 
a similar graph, but this time for CF and χ. The errors this time are in general smaller, 
with the densest concentration of points restricted to the ±10% error bounds. The number 
of regions in each quadrant is now distributed more evenly than in the previous figure. 
The bottom panel shows the percentage errors in χ of each region as a function of the cloud 
fraction of the region at the original resolution, and indicates that the distribution of χ 
errors tightens around smaller values as the cloud fraction increases. 

Figure 3 shows the mean error for 110x110 regions as a function of cloud fraction. 
Each value was obtained by averaging the errors of regions that have cloud fraction 
within the predetermined 0.1-width bin. Note that the last bin has by far the most values 
consistent with the well-known U-shape behavior of cloud fraction distributions. This 
figure shows prominently the dramatic effect of averaging a large number of random 
errors: the mean errors of ensembles of ~5,000 regions and above are very small, with the 
exception of ν_MOM at small cloud fractions. The larger impact of sampling on ν_MOM 
compared to the other two inhomogeneity parameters can be easily explained: both χ and 
ν_MLE depend on the linear mean and the mean logarithm of optical thickness, the former 
being simply the ratio $\exp(\overline{\ln\tau})/\bar{\tau}$, and the latter being a function of the difference 
$\ln\bar{\tau} - \overline{\ln\tau}$; since subsampling affects both means in the same way, i.e., both are either 
overestimated or underestimated for a certain region, the aforementioned ratio and 
difference may not change much after subsampling; in other words, there is error 
cancellation. On the other hand, for 23.4% of 110×110 regions (Fig. 1), subsampling has 
opposite effects on τ̄ and σ_τ; when this happens, the value of ν_MOM (eq. 1a) from 
subsampled data will tend to diverge strongly from the value before subsampling. Since 
the number of regions for which this happens is significant, the effects will linger even 
after significant averaging of percentage errors. Between ν_MLE and χ, the latter is less 
affected by subsampling. There are two reasons for this. First, χ is defined simply as the 
ratio of two quantities affected in a similar manner by subsampling (eq. 1c); ν_MLE is a 
more complex function of the linear mean and mean logarithm difference (eq. 1b), and is 
therefore subject to error propagation. Second, χ has an upper bound of 1, by definition, 
while ν_MLE (and, of course, ν_MOM) can grow without bounds. Despite the fact that we 
exclude regions with ν_MLE or ν_MOM greater than 40 in our analysis to mitigate the effect of 
these pathological cases, some residual impact from regions with large ν_MOM, where its 
value can change rapidly by subsampling, remains. Thus, the unbounded nature of ν_MOM 
is responsible for the apparent paradox that some of the most homogeneous regions may 
potentially be the ones suffering from the greatest percentage subsampling errors with 
respect to this parameter. 
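
The error cancellation argument can be made explicit to first order. As a sketch, suppose subsampling perturbs the linear mean, the exponentiated mean logarithm, and the standard deviation of optical thickness by small relative amounts ε_1, ε_2 and ε_σ respectively (symbols introduced here only for illustration), and take the method of moments estimate to be the squared ratio of mean to standard deviation, as is standard for a gamma distribution:

```latex
\chi' = \chi\,\frac{1+\varepsilon_2}{1+\varepsilon_1}
      \approx \chi\,\bigl(1 + \varepsilon_2 - \varepsilon_1\bigr),
\qquad
\nu_{MOM}' = \nu_{MOM}\left(\frac{1+\varepsilon_1}{1+\varepsilon_\sigma}\right)^{2}
           \approx \nu_{MOM}\,\bigl[1 + 2(\varepsilon_1 - \varepsilon_\sigma)\bigr]
```

When ε_1 and ε_2 share a sign, the relative error in χ is only their difference. When ε_1 and ε_σ have opposite signs, their magnitudes add in the ν_MOM error and the square doubles the result.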

Further evidence of the beneficial effects of averaging errors over a group of 
regions is shown in Fig. 4. These percentage errors of χ and ν_MLE are for 1000 ensembles 
each consisting of 30 and 10,000 regions. The size of each region in these randomly 
constructed ensembles is 110×110 pixels. As discussed in Section II, the mean error of an 
ensemble of 30 regions is meant to represent typical monthly average errors of individual 
gridpoints, while the mean error of an ensemble of 10,000 regions approximates typical 
monthly-average errors of latitude zones. The mean error of 30-region ensembles almost 
always stays within ±2% for χ and within ±10% for ν_MLE. The mean errors of ensembles 
consisting of 10,000 regions are much smaller than those of the 30-region ensembles and cluster 
within a very small range of values. This is not surprising since each 10,000-region 
ensemble, even if constructed randomly, contains many regions in common with the other 
ensembles because the population from which it is drawn is only larger by an 
approximate factor of 5 (there are ~53,700 110×110 regions in the dataset that satisfy our 
criteria). It is also interesting that the mean errors of 10,000-region ensembles are always 
positive for χ. This is because of the tight range of χ errors and the fact that there is a 
slightly larger number of regions with positive errors (Fig. 2, top, indicates that 52.7% of 
regions have positive errors). On the other hand, because of the wider range of ν_MLE 
errors, there are both positive and negative mean errors for 10,000-region ensembles. The 
positive errors dominate due to the larger fraction of positive errors for individual regions 
(~55%). 

Another way to assess the errors of subsampling on optical thickness statistics is 
shown in Fig. 5. The top panel shows the bounds of percentage errors that contain 95% 
of the 1000 ensembles, for ensembles consisting of a variety of region numbers (each of 
110×110 pixel size) as indicated in the abscissa. For example, the top panel of Fig. 5 
indicates that 95% (950) of 90-region ensembles have mean errors of ν_MOM within ±5.2% 
(3rd point of topmost curve). CF and χ have the smallest error bounds that contain 95% of 
the ensembles, followed by τ̄, ν_MLE, and ν_MOM. For ensembles consisting of 10,000 
regions the error range that contains 95% of the ensembles is smaller than ±2% for all 
quantities (±0.25% for χ). The bottom panel of Fig. 5 shows the percentage error range 
that contains 95% of 30-region ensembles of region size indicated in the abscissa. For 
example, 95% of 30-region ensembles have ν_MOM subsampling errors within ±12.95% 
when the region size is 60×60 pixels (third point of topmost curve). Because we kept the 
number of ensembles constant at 1000 for each region size (even if there are naturally 
more regions of smaller size in our dataset), it is not surprising that there is a tendency for 
the sampling error that contains 95% of the ensembles to decrease with region size. In 
other words, subsampling errors become greater for regions consisting of a smaller 
absolute number of cloudy pixels at the nominal 1 km resolution (i.e., 1°×1° gridpoints at 
higher latitudes, or gridpoints with smaller cloud fractions). 
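
One way to obtain such a 95% bound from a set of ensemble-mean errors is sketched below. Treating the bound as symmetric about zero and taking it from the 95th percentile of the absolute errors is an assumption about how the error range is defined, and the function name is hypothetical.

```python
import numpy as np

def error_bound_95(ensemble_means):
    """Symmetric bound b such that about 95% of the errors lie within +/- b."""
    return float(np.percentile(np.abs(ensemble_means), 95))
```

Applied to the 1000 ensemble means for a given quantity, ensemble size and region size, this returns a single number that can be plotted against ensemble size or region size.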

IV. Effective radius errors 

The analysis in this section follows in the footsteps of the analysis presented in the 
previous section. Case in point, Fig. 6 is the counterpart of Fig. 1, but is now for the mean 
and standard deviation of effective radius. There are similarities with Fig. 1, such as the 
rapid decrease in the density of points outside the ±20% error range, but also differences 
such as the stronger dominance of positive errors for both the mean and the standard 
deviation. Indeed, only 21.3% of 110x110 regions have negative errors in the mean, and 
33.3% have negative errors in the standard deviation. This explains the lack of negative 
errors when averaging over a larger number of regions, as we do in Figs. 7 and 8, which 
are similar to the previous Figs. 3 and 4 (also, Fig. 9 is analogous to Fig. 5). Fig. 7 
suggests that mean errors of subsampling for mean effective radius are slightly greater 
than those for mean optical thickness, while somewhat unexpectedly the error in standard 
deviation does not improve with cloud fraction (although it improves with region size as 
shown in the bottom panel of Fig. 9). Also, there seems to be resistance in reducing the 
mean errors below 2% even when ensembles consist of 10,000 regions (Fig. 8, and top 
panel of Fig. 9). 

All this points to systematic biases in the statistics of effective radius when 
subsampling is performed: apparently, subsampling yields frequent systematic 
underestimates of both the mean and the standard deviation of effective radius, i.e., errors 
are not always random. This curious phenomenon was further explored by examining 
effective radius histograms retrieved from unsampled and subsampled data. Indeed, when 
the normalized frequency distribution of combined effective radius (i.e., both liquid and 
ice clouds) was plotted for the unsampled data from all 300 granules (not shown) with a 
1 µm bin resolution, four peaks were observed: one narrow for the 3-4 µm bin, one wide 
between 8 and 12 µm, one narrow for the 29-30 µm bin, and one more narrow in the 
58-59 µm bin. The first two peaks are definitely liquid cloud peaks, the third coincides with 
the upper limit of liquid cloud droplet effective radius, and must therefore contain both 
liquid and ice particles, and the fourth peak is an ice cloud peak. While the 1 km and 5 
km histograms agree overall, there are small but noticeable differences in those peaks: 
for the first two peaks, there is a larger normalized frequency for the subsampled 
retrievals, while the opposite happens for the last two peaks. These differences are large 
enough to result in systematically smaller effective radii for the subsampled data in the 
majority of regions into which we divide the granules. They also lead to somewhat wider 
histograms for the unsampled data which explains the tendency for positive standard 
deviation subsampling errors. Further separate analysis of the 200 Terra granules and the 
100 Aqua granules showed that the distinct differences previously described between 
original and subsampled histograms appeared only in the Terra effective radius 
histograms (not shown); the counterpart histograms for optical thickness were virtually 
indistinguishable for both platforms. When Terra effective radius histograms were then 
constructed separately (not shown) for retrievals corresponding to different pairs of 
detector elements (the 2.1 µm band has 20 detector elements each of 500 m resolution for 
a total viewing path of 10 km along track, so for the 1 km effective radius Level-2 
product measurements from 2 detectors are aggregated), one of the histograms stood out 
as having characteristics such as those described above for the ensemble histogram of 
subsampled data. This histogram was from the detector pair that yielded lines 1, 11, 21, 
31, etc. of the granule which were always included in the subsampled dataset. Thus, a 
source of bias errors can appear in subsampled Level-3 data if pixel lines with 
systematically different radiative characteristics (and therefore systematically different 
retrievals) than the other lines are always selected by the subsampling algorithm. This is 
what happened in our case, and while the bias error is small in magnitude, it was still 
easily detected by the subsampling analysis. 
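
The reason the same detector pair is always selected can be seen from the arithmetic of the sampling stride. The sketch below assumes 10 one-km lines per scan (one per detector pair), 2030 lines per granule, and a stride-5 sampling that starts on the first line of the granule; the variable names are illustrative.

```python
# Which along-track positions within a 10-line scan does a stride-5 sampling hit?
# Assumes 10 one-km lines per scan and sampling that starts on the first line.
lines_per_scan = 10
stride = 5
n_lines = 2030

sampled_lines = range(0, n_lines, stride)      # 0-based line indices kept by the sampling
positions_hit = sorted({line % lines_per_scan for line in sampled_lines})
print(positions_hit)  # positions within the scan that are ever sampled
```

Because the stride divides the scan length, only two of the ten within-scan positions are ever sampled, and position 0 (lines 1, 11, 21, ... in 1-based numbering) is one of them. Any systematic difference in the detectors feeding those positions therefore propagates directly into the subsampled statistics.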

V. Errors from histograms 

The MODIS Level-3 cloud product also includes SDSs that are histograms of cloud 
optical thickness and effective radius. These are also constructed from subsampled data. 
Although the statistical quantities and parameters we examined here are either given 
directly as distinct SDS products (τ̄, σ_τ, $\overline{\ln\tau}$) or can be trivially derived from them using 
eq. (1) (χ, ν_MOM, ν_MLE), it would be of interest to obtain an assessment of the errors if the 
quantities of interest are obtained from the histogram SDSs. 
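The three moments, τ̄, σ_τ, and $\overline{\ln\tau}$, that are needed for eq. (1) are derived from the 
discrete probability distribution function p(τ) built from the histograms for each region 
(of the 8 region sizes) using 5 km subsampled data, as follows: 

$$\bar{\tau} = \sum_{i=1}^{N} \tau_i\, p(\tau_i), \qquad \sigma_{\tau}^{2} = \sum_{i=1}^{N} (\tau_i - \bar{\tau})^{2}\, p(\tau_i), \qquad \overline{\ln\tau} = \sum_{i=1}^{N} \ln\tau_i\; p(\tau_i) \qquad (2)$$

where τ_i is the optical thickness value representing bin i. 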
Analogous relationships apply for effective radius. The number of bins N varies 
according to the type of histogram, and the values we used for each case are given below. 
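
Computing the moments from a histogram can be sketched as below. Representing each bin by its midpoint is an assumption, as are the function name and the form of the inputs (bin counts plus bin edges); the actual Level-3 bin boundaries are not reproduced here.

```python
import numpy as np

def moments_from_histogram(counts, bin_edges):
    """Mean, standard deviation and mean logarithm from binned data."""
    counts = np.asarray(counts, dtype=float)
    edges = np.asarray(bin_edges, dtype=float)
    centers = 0.5 * (edges[:-1] + edges[1:])   # bin midpoints as representative values (assumption)
    p = counts / counts.sum()                  # discrete probability distribution
    mean = np.sum(p * centers)
    std = np.sqrt(np.sum(p * (centers - mean) ** 2))
    mean_ln = np.sum(p * np.log(centers))      # requires strictly positive midpoints
    return mean, std, mean_ln
```

With coarse bins, the choice of representative value matters most for the second moment, which is one reason a single very wide bin can degrade the reconstructed standard deviation.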

It should be underlined that the subsampling error is defined in this case as the 
difference between the value of the desired quantity calculated from the nominal 1 km 
data directly (i.e., not from histograms constructed with 1 km data) and the value derived 
using eq. (2), i.e., from histograms of subsampled 5 km data. 

Figure 10 is for optical thickness and is analogous to the bottom panel of Fig. 5. 
The top panel is for calculations using MODIS Level-3 binning for liquid clouds (N = 45 
bins) and the bottom is for calculations using ice cloud binning (N = 30 bins). Both 
histograms extend up to a value of 100 for optical thickness, but the width of the bins is 
different (the ice histograms better resolve small values of optical thickness and are 
coarser for large values). Results for both panels of Fig. 10 look similar to the results in 
the bottom panel of Fig. 5, except for the ν_MLE error for liquid cloud histogram binning, 
which is worse for most region sizes than its counterpart for ν_MOM. 

Figure 11 is for effective radius and is analogous to the bottom panel of Fig. 9. The 
top panel is for calculations using histogram binning for liquid clouds and the bottom is 
for calculations with ice cloud binning. The former originally uses N = 23 bins in the 
Level-3 dataset, extending from 2 to 30 µm, but a 24th, very wide bin was added from 30 
to 60 µm to accommodate the large particle effective radii encountered in our dataset. 
The latter uses N = 12 histogram bins extending from 6 to 60 µm. Again, there is little 
difference from what has already been shown in Fig. 9, with the exception of the error in 
standard deviation when the liquid cloud histogram binning is used. This is probably the 
result of the coarse last bin that was arbitrarily added. Results with ice cloud binning do 
not seem to be much affected by neglect of particle sizes below 6 µm. 

In conclusion, for monthly or longer time scales, one can reconstruct cloud optical 
thickness or effective radius moments, or optical thickness inhomogeneity parameters 
from MODIS Level-3 histograms (built from subsampled 5 km data) for a 1°×1° region, 
without suffering much additional subsampling error relative to the case where the 
moments and parameters come from distinct Level-3 SDSs. 

VI. Summary and conclusions 

Cloud optical thickness and effective radius Scientific Datasets (SDSs) in the MODIS 
Level-3 daily, eight-day, and monthly products come from aggregation on a 1°×1° grid of 
Level-2 orbital swath data that have been subsampled at 5 km. This study has examined 
the impact of this subsampling on cloud fraction, the mean and standard deviation of 
optical thickness and effective radius, as well as on parameters that convey the radiative 
impact of the variability of optical thickness. As a measure of the subsampling effect we 
use the percentage difference between unsampled and subsampled results for ensembles 
of regions with size on the order of 1°×1°. The unsampled data come from 300 Terra and 
Aqua granules obtained at ~40°N for several post-2000 November days. 

It was shown that Level-3 subsampling does not affect the various quantities 
investigated to the same degree, with second order moments and quantities depending on 
second order moments suffering greater subsampling errors, as expected. For individual 
regions consisting of 110x110 pixels the vast majority of regions have errors within 
±20% for mean and standard deviation of optical thickness and effective radius. Errors 
for cloud fraction and the inhomogeneity parameter χ are smaller, and errors for the 
inhomogeneity parameters ν_MOM and ν_MLE are greater (especially for ν_MOM). Mean errors 
drop dramatically when averages over a sufficient number of regions (e.g., monthly 
and/or latitudinal averages) are taken: for ensembles of 30 regions (corresponding to 
monthly averages) errors for most region sizes are less than 15% for ν_MOM and ν_MLE 95% 
of the time, while for the other quantities they are generally below 5%. Subsampling 
errors seem to be mostly of random nature, but there was evidence that small but 
systematic underestimates may be occurring for effective radius mean and standard 
deviation. We traced this back to systematic differences in the retrievals from different 
2.1 µm band detectors: the subsampling procedure was systematically picking a pixel line 
(from the first two detectors) which had a radiatively different appearance from the other 
pixel lines. Finally, when histograms built from subsampled data with the same binning 
as in the Level-3 dataset are used to reconstruct the quantities of interest, the mean errors 
at monthly scales do not deteriorate significantly. 

It may be worth mentioning that subsampling error analysis was also performed 
with the 2D bounded cascade model of Marshak et al. [8] which offers the advantage that 
the properties of clouds (cloud fraction, degree of inhomogeneity, mean optical 
thickness) can be easily controlled. Optical thickness errors due to subsampling from 
model clouds largely mirrored those derived from MODIS. The ranking of parameters 
according to error magnitude is the same (χ exhibits the smallest errors and ν_MOM the 
largest), the error decreased with cloud fraction and cloud homogeneity, and exhibited 
rapid decline when averaged over ensembles of randomly generated cascade fields. 

The results in this paper provide guidance to users of MODIS Level-3 cloud 
products on the range of errors due to subsampling they should expect and perhaps 
account for, in scientific work with this dataset. Although the findings may be to some 
extent specific to the type of clouds encountered in our granules which come from a 
relatively limited geographical location and are for a particular month of the year only, it 
would probably be safe to conclude that subsampling errors should not be a serious 
concern for individual gridpoints of MODIS Level-3 eight-day (E3) and monthly (M3) 
data, or D3 (daily) data that have undergone moderate additional temporal averaging, or 
for spatial averages (e.g., zonal averages). A study of the type shown here, but with a 
global dataset and more SDSs would be even more robust statistically and would give a 
more definitive answer on the impact of MODIS Level-3 subsampling. 
List of figures 

Figure 1 Subsampling error in the mean and standard deviation of optical thickness (both 
in %) for each 110×110 pixel region of our dataset. The numbers in the corners are the 
percentage of regions with errors falling into each of the four quadrants. Note that there 
were few regions whose errors fell outside the axis limits (±50%) of this plot. 

Figure 2 Subsampling error in CF and χ (both in %) for each 110×110 pixel region of 
our dataset (top), and subsampling error of χ as a function of the actual (unsampled) 
cloud fraction of each region (bottom). 

Figure 3 Mean error (in %) for various statistics of optical thickness as a function of 
cloud fraction. The right ordinate shows the number of 110x110 pixel regions with cloud 
fraction that falls within each 0.1-width bin (regions with cloud fractions less than 0.1 
were omitted). 

Figure 4 Mean error (in %) of χ and ν_MLE for each of the 1000 ensembles of 30 and 
10,000 110×110 pixel regions as a function of the mean value of the ensemble obtained 
at the original 1 km nominal resolution. 

Figure 5 Top: Subsampling error range (in %) that contains 95% of the 1000 ensembles 
each made of the number of 110x110 pixel regions shown in the abscissa; bottom: as in 
top panel, but for 1000 ensembles of 30 regions of the size shown in the abscissa. 

Figure 6 As in Fig. 1, but for effective radius. 

Figure 7 As in Fig. 3, but for mean and standard deviation of effective radius. 

Figure 8 As in Fig. 4, but for mean and standard deviation of effective radius. 

Figure 9 As in Fig. 5, but for mean and standard deviation of effective radius. 
Figure 10 As the bottom panel of Fig. 5 (save the cloud fraction), but when histograms 
from 5 km subsampled data are used to reconstruct the statistics or inhomogeneity 
parameters. Top panel shows results when the Level-3 binning for liquid clouds is used, 
and bottom panel when ice cloud binning is used. 

Figure 11 As Fig. 10, but for mean and standard deviation of effective radius. 
Popular summary for “The impact of subsampling on MODIS Level-3 statistics of 
cloud optical thickness and effective radius” 
Some of the most important parameters to study the role of clouds in climate are their 
optical thickness OT (an indicator of the amount of cloud water) and effective radius 
REFF (a measure of cloud particle size). MODIS aboard the Terra and Aqua satellite 
platforms is able to retrieve nearly global datasets of these parameters at an 
approximately 1 km resolution. While these are very useful for regional studies, climate 
studies can often be more efficiently performed by using coarser resolution datasets. For 
researchers who perform climate studies, the MODIS team provides higher-level (Level-3, 
as they are called) gridded datasets of 1°×1° (roughly 100 km × 100 km) resolution 
that are averaged over 1-day, 8-day or monthly time periods. To build this dataset the 
MODIS aggregation algorithm does not use every 1 km pixel, but only every 5th pixel in 
both spatial directions, that is, it samples only 1 out of 25 pixels. This may lead to errors 
in the gridded mean values of OT and REFF relative to the case where all the pixels are 
used. The purpose of the study presented in this paper is to assess the nature, magnitude, 
and dependencies of these sampling errors. Actually, it not only examines the errors in 
the mean values of OT and REFF, but also of other related parameters that quantify the 
inhomogeneity of clouds in terms of their impact on reflected solar radiation. The general 
conclusion of the study is that users of Level-3 data should not be particularly concerned 
about sampling errors since their magnitude is usually small and they are mostly of 
random nature. The latter means that the values of sampling errors can be driven further 
down if climatologies of OT, REFF and the inhomogeneity parameters are built by 
ensembles of 1°×1° regions such as those used to form monthly, latitudinal, 
hemispherical or global averages or any combination of temporal and spatial averages. 


